Problem Studied


This final progress report surveys the results of the research project
“A Scalable Parallel Library for Numerical Linear Algebra.”

This research project consisted of a number of closely related topics
involving researchers at several institutions. ScaLAPACK is being
developed at the University of Tennessee at Knoxville, Oak Ridge
National Laboratory, and the University of California at Berkeley.
ScaLAPACK is a prototype library of software for performing dense and
band linear algebra computations on message-passing computers. It also
includes out-of-core linear solvers, out-of-core eigensolvers, new
ScaLAPACK and PBLAS routines for packed storage, an HPF interface to
a subset of ScaLAPACK routines, SuperLU, SuperLU_MT, and SuperLU_DIST,
a spectral divide and conquer (SDC) eigensolver using the matrix sign
function, and ATLAS.

P_ARPACK is developed at Rice University and is a distributed-memory
software package for solving large, sparse, nonsymmetric eigenproblems
using a variant of the implicitly restarted Arnoldi method. CAPSS,
developed at the University of Illinois at Urbana-Champaign, is a fully
parallel package for solving sparse linear systems of the form Ax = b on
message-passing computers using matrix factorization. Researchers at the
University of Tennessee at Knoxville and the University of California at
Los Angeles are developing a package called ParPre, a collection of
parallel preconditioners for iterative solution methods for linear
systems of equations. ScaLAPACK, P_ARPACK, CAPSS, and ParPre have been
placed in the public domain and are accessible via the National HPCC
Software Exchange.

http://www.netlib.org/scalapack/

The research is leading to a number of important new software tools and
standards. Recognizing that a message-passing standard was necessary to
ensure the easy portability of the prototype libraries, the project
initiated and promoted the development of the Message Passing Interface
(MPI) [4], as well as the HPF [5] standard. Standards specific to
parallel linear algebra are also being developed.

2 Summary of Most Important Results

The  official  release  of  the  PBLAS,  version  2.0,  was  announced  in  FY 
1999.  This  new  release  of  the  PBLAS  will  be  included  in  the  next 
release  of  ScaLAPACK,  projected  for  the  year  2000.  Also  included  in 
the  next  release  of  ScaLAPACK  will  be  the  parallel  divide  and  conquer 
symmetric  eigensolver  that  was  developed  in  FY  1998. 

SuperLU is a supernodal version of sparse Gaussian elimination. It is
currently one of the two fastest serial implementations of sparse
Gaussian elimination (it wins on some test problems, while another code
is faster on others) and the fastest parallel implementation. Serial
SuperLU, multi-threaded SuperLU_MT, and distributed-memory SuperLU_DIST
are available on netlib.

We have a running prototype of a new algorithm, which may be the
ultimate solution for the symmetric eigenproblem on both parallel and
serial machines. This algorithm has been incorporated into the symmetric
eigenproblem routines of LAPACK, version 3.0, and will soon be
propagated into the SVD and the SVD-based least squares solver. We also
expect to propagate this algorithm into ScaLAPACK.

ARPACK++: A C++ interface has been developed for ARPACK. This package
provides templates for using ARPACK in an object-oriented environment.
All of the Fortran templates provided in the EXAMPLES directory are
available in a high-level form that requires little more from the user
than a matrix definition and a specification of which eigenvalues to
compute. Shift-invert spectral transformation modes use the SuperLU
sparse factorization software mentioned previously in this report. This
software is complete and is now undergoing beta testing. Further testing
and refinement should be completed within the next six months, after
which the package will become part of the regular ARPACK distribution.
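The shift-invert idea can be sketched in a few lines: the eigenvalues of
A closest to a shift sigma become the largest-magnitude eigenvalues of
(A - sigma I)^{-1}, so one factorization of A - sigma I is reused for
every solve. The Python sketch below is only an illustration under
stated assumptions and is not ARPACK++ code: it uses a small dense
matrix, plain power iteration in place of the implicitly restarted
Arnoldi method, and a dense LU factorization where ARPACK++ would call
SuperLU. All function names are hypothetical.

```python
from math import sqrt


def lu_factor(a):
    """Dense LU factorization with partial pivoting (stand-in for SuperLU)."""
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Pick the row with the largest entry in column k as the pivot row.
        pivot = max(range(k, n), key=lambda i: abs(lu[i][k]))
        if lu[pivot][k] == 0.0:
            raise ValueError("A - sigma*I is singular; choose another shift")
        lu[k], lu[pivot] = lu[pivot], lu[k]
        perm[k], perm[pivot] = perm[pivot], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm


def lu_solve(lu, perm, b):
    """Solve A x = b using the factors returned by lu_factor."""
    n = len(lu)
    x = [b[perm[i]] for i in range(n)]
    for i in range(n):  # forward substitution (unit lower triangle)
        for j in range(i):
            x[i] -= lu[i][j] * x[j]
    for i in reversed(range(n)):  # back substitution (upper triangle)
        for j in range(i + 1, n):
            x[i] -= lu[i][j] * x[j]
        x[i] /= lu[i][i]
    return x


def eigenvalue_near(a, sigma, max_iter=500, tol=1e-12):
    """Estimate the eigenvalue of a closest to sigma (shift-invert power iteration)."""
    n = len(a)
    shifted = [[a[i][j] - (sigma if i == j else 0.0) for j in range(n)]
               for i in range(n)]
    lu, perm = lu_factor(shifted)  # factor once, reuse in every iteration
    x = [float(i + 1) for i in range(n)]  # arbitrary nonzero start vector
    estimate = sigma
    for _ in range(max_iter):
        y = lu_solve(lu, perm, x)  # y = (A - sigma I)^{-1} x
        # theta approximates the dominant eigenvalue of the inverse operator.
        theta = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi * xi for xi in x)
        norm = sqrt(sum(yi * yi for yi in y))
        x = [yi / norm for yi in y]
        previous, estimate = estimate, sigma + 1.0 / theta
        if abs(estimate - previous) <= tol * max(1.0, abs(estimate)):
            break
    return estimate
```

For the 3-by-3 matrix with 2 on the diagonal and -1 on the off-diagonals,
calling eigenvalue_near with shifts 0.5, 1.9, and 3.0 should return
approximations to 2 - sqrt(2), 2, and 2 + sqrt(2) respectively.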

We have continued developing a blocked out-of-core variant of the
implicitly restarted Arnoldi method, as well as improved spectral
transformation strategies and deflation techniques for large sparse
eigenvalue problems.

Our  work  has  been  adopted  by  Mathworks  and  forms  the  basis  for 
the  new  eigs  command  in  Matlab  for  sparse  eigenvalue  computation. 

We have completed an ARPACK users guide that is currently available on
the web:

R.B. Lehoucq, D.C. Sorensen, and C. Yang,

ARPACK Users Guide: Solution of Large Scale Eigenvalue Problems
with Implicitly Restarted Arnoldi Methods, 142 pages, SIAM
Publications (1998)

A user’s guide for ARPACK++ has been completed and is currently
available from http://www.caam.rice.edu/software/ARPACK:

F.M. Gomes and D.C. Sorensen,

ARPACK++: A C++ implementation of the ARPACK eigenvalue package
(draft date June 1997).

The main effort for the CAPSS project was directed towards developing a
scalable “Domain-separator ICCG” for large-scale multiprocessors. This
is a major development and integration effort that involves adapting
several components of CAPSS and MFACT as well as implementing new
algorithms. For example, we will replace the numeric kernels for
multifrontal factorization (in CAPSS) by a new tree-update scheme with
flexible matrix forms. Likewise, for applying the preconditioner we will
modify SI to work with our new, flexible matrix forms. Furthermore, we
will add a post-processing step to drop small elements.
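The idea of dropping small elements from a factor and using the result
as a preconditioner for conjugate gradients (the "ICCG" pattern) can be
sketched as follows. This is a minimal Python illustration under stated
assumptions, not CAPSS code: it works on small dense matrices, computes
a complete Cholesky factor and then applies the drop step, and all
function names, the example matrix, and the drop tolerance are
hypothetical. Because the diagonal of the factor is kept, the dropped
factor L still gives a symmetric positive definite preconditioner
M = L L^T.

```python
from math import sqrt


def example_matrix(n):
    """Hypothetical SPD test matrix: diagonally dominant, banded."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        a[i][i] = 4.0
        if i + 1 < n:
            a[i][i + 1] = a[i + 1][i] = -1.0
        if i + 3 < n:
            a[i][i + 3] = a[i + 3][i] = -0.5
    return a


def cholesky(a):
    """Return the lower-triangular Cholesky factor of a dense SPD matrix."""
    n = len(a)
    l = [[0.0] * n for _ in range(n)]
    for j in range(n):
        d = a[j][j] - sum(l[j][k] ** 2 for k in range(j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        l[j][j] = sqrt(d)
        for i in range(j + 1, n):
            s = sum(l[i][k] * l[j][k] for k in range(j))
            l[i][j] = (a[i][j] - s) / l[j][j]
    return l


def drop_small(l, drop_tol):
    """Post-processing step: zero off-diagonal entries below drop_tol."""
    return [[v if (i == j or abs(v) >= drop_tol) else 0.0
             for j, v in enumerate(row)] for i, row in enumerate(l)]


def apply_preconditioner(l, r):
    """Solve (L L^T) z = r by forward and backward substitution."""
    n = len(l)
    y = [0.0] * n
    for i in range(n):
        y[i] = (r[i] - sum(l[i][k] * y[k] for k in range(i))) / l[i][i]
    z = [0.0] * n
    for i in reversed(range(n)):
        z[i] = (y[i] - sum(l[k][i] * z[k] for k in range(i + 1, n))) / l[i][i]
    return z


def dot(u, v):
    return sum(ui * vi for ui, vi in zip(u, v))


def matvec(a, x):
    return [dot(row, x) for row in a]


def pcg(a, b, l, tol=1e-10, max_iter=200):
    """Preconditioned conjugate gradients with M = L L^T; b must be nonzero."""
    x = [0.0] * len(b)
    r = b[:]
    z = apply_preconditioner(l, r)
    p = z[:]
    rz = dot(r, z)
    b_norm = sqrt(dot(b, b))
    for iteration in range(1, max_iter + 1):
        ap = matvec(a, p)
        alpha = rz / dot(p, ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, ap)]
        if sqrt(dot(r, r)) <= tol * b_norm:
            return x, iteration
        z = apply_preconditioner(l, r)
        rz_new = dot(r, z)
        p = [zi + (rz_new / rz) * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, max_iter
```

A typical call is pcg(a, b, drop_small(cholesky(a), 0.1)). The trade-off
is the usual one: a larger drop tolerance gives a sparser, cheaper factor
but a weaker preconditioner, while the undropped factor solves the system
in a single iteration at the cost of a full factorization.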

The plans for our object-oriented framework for grid solvers focused on
completing the design and implementation of the factorization phases,
followed by the design and implementation of the grid-based
preconditioner for iterative methods, and then demonstrating the
capabilities of the framework on various grid-based applications.

Plans for the interactive framework DLab call for expanding its
repertoire of operations to include support for sparse matrix
computations and fast Fourier transforms. Substantial refinements are
planned for DLab’s scheduler, resource monitor, and its performance
prediction and lazy evaluation capabilities. This work was presented in
a minisymposium at the SIAM National Meeting in Atlanta in May 1999.

The ParPre library now comprises Schwarz preconditioners,
Schur-complement domain decomposition methods, block SSOR/ILU
preconditioners, and V-cycle multilevel methods. The multilevel methods
include both parallel multi-colour ILU and algebraic multigrid type
methods. The code is being maintained to keep it compatible with the
PETSc library, and we are doing further research into the multilevel
methods.

Web site for the ParPre project:

http://www.cs.utk.edu/~eijkhout/parpre.html

A  manual  for  ParPre  as  well  as  a  paper  on  the  design  principles 
have  been  published  [3,  2]. 

We  have  rewritten  some  of  the  internals  of  ParPre  to  reflect  changes 
in  the  design  of  the  PETSc  library,  which  we  use.  The  functionality  of 
the  Block  SSOR  method  has  been  expanded  so  that  it  can  reproduce 
various  block  factorizations,  including  an  exact  factorization. 

We have identified several aspects of the algebraic multilevel method
in which the parallel method is fundamentally different from earlier
sequential ones. We are investigating these and updating the code to
reflect the new insights.

We  have  begun  research  into  the  existence  question  of  incomplete 
factorizations.  Such  sequential  methods  are  crucial  as  local  components 
of  various  types  of  domain  decomposition  methods. 

We will continue the development of algebraic multilevel methods and
perform extensive testing on them. In addition, we intend to supply a
proof of the condition number reduction of the method, analogous to
the proofs in [1].

ATLAS was extended in FY 1998 to support the matrix-vector multiply
DGEMV, and will eventually generate all Level 3 BLAS directly, as well
as provide complex data types. Much of the technology and approach
developed here can be applied to the other Level 3 BLAS, and the general
strategy can have an impact on basic linear algebra operations in
general and may be extended to other important kernel operations.
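ATLAS selects among generated kernel variants by timing them on the
target machine when the library is installed. The Python sketch below
shows that idea in miniature: it times a blocked matrix multiply for a
few candidate block sizes and keeps the fastest. It is an illustration
only and not ATLAS code; the function names and candidate list are
hypothetical, and timings of interpreted Python say nothing about the
performance of a tuned BLAS.

```python
import time


def blocked_matmul(a, b, block):
    """Multiply dense matrices a (n x m) and b (m x p) in block x block tiles."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    for ii in range(0, n, block):
        for kk in range(0, m, block):
            for jj in range(0, p, block):
                # Update one tile of c using one tile of a and one tile of b.
                for i in range(ii, min(ii + block, n)):
                    row_a, row_c = a[i], c[i]
                    for k in range(kk, min(kk + block, m)):
                        aik, row_b = row_a[k], b[k]
                        for j in range(jj, min(jj + block, p)):
                            row_c[j] += aik * row_b[j]
    return c


def tune_block_size(n, candidates, repeats=3):
    """Time each candidate block size on an n x n problem and return the fastest."""
    a = [[float(i + j) for j in range(n)] for i in range(n)]
    b = [[float(i - j) for j in range(n)] for i in range(n)]
    timings = {}
    for block in candidates:
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            blocked_matmul(a, b, block)
            best = min(best, time.perf_counter() - start)
        timings[block] = best  # keep the best of several runs to reduce noise
    return min(timings, key=timings.get), timings
```

A call such as tune_block_size(64, [4, 8, 16, 32]) returns the selected
block size together with the measured timings. The selection is made
once, ahead of use, which mirrors the install-time nature of the
adaptation described above.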

Another avenue of research for ATLAS involves sparse algorithms. The
fundamental building block of iterative methods is the sparse matrix
times dense vector multiply. This work should leverage the present
research (in particular, make use of the dense matrix-vector multiply).
The present work uses compile-time adaptation of software. Since
matrix-vector multiply may be called literally thousands of times during
the course of an iterative method, we plan to investigate run-time
adaptation as well. These run-time adaptations could include
matrix-dependent transformations [6], as well as specific code
generation. For further details, please refer to the following URL:

http://www.netlib.org/atlas/
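As a concrete picture of the sparse matrix times dense vector multiply
discussed above, the following Python sketch stores the matrix in
compressed sparse row (CSR) form and computes y = A x. The choice of CSR
is an assumption made for illustration (the report does not name a
storage format), and the function names are hypothetical.

```python
def dense_to_csr(dense):
    """Convert a dense matrix (list of rows) to CSR arrays."""
    values, col_idx, row_ptr = [], [], [0]
    for row in dense:
        for j, v in enumerate(row):
            if v != 0:
                values.append(v)
                col_idx.append(j)
        row_ptr.append(len(values))  # end of this row in values/col_idx
    return values, col_idx, row_ptr


def csr_matvec(values, col_idx, row_ptr, x):
    """Compute y = A x, touching only the stored nonzeros of A."""
    y = []
    for i in range(len(row_ptr) - 1):
        total = 0.0
        for k in range(row_ptr[i], row_ptr[i + 1]):
            total += values[k] * x[col_idx[k]]
        y.append(total)
    return y
```

The inner loop reads x through col_idx, so its memory access pattern
depends on the sparsity structure of the particular matrix. That
dependence is the reason matrix-dependent, run-time adaptation is
attractive for this kernel.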

The BLAS Technical Forum meetings continued through 1998. On
April 27-29, 1998, the meeting was held at SGI/Cray Research in Eagan,
Minnesota. The next meeting was held at NIST in Washington, D.C. on
October 8-9, 1998. On December 14-16, 1998, the meeting was held at LBL
in Berkeley, California. The final meeting occurred on March 16-18,
1999, at the Ramada Inn and Suites in Oak Ridge, TN.

Refer to the BLAS Technical Forum homepage

http://www.netlib.org/utk/papers/blast-forum.html

for detailed minutes from each of the meetings and a draft of the
document for the BLAS Standard. Reference implementations of the
proposed routines are being written. The final draft of the BLAS
Standard will be available in summer 2000.

Finally, a large team of experts is working on a book of
Eigentemplates, which is designed to help the user find the best
eigenvalue algorithm available for a particular problem and computer.
The book is entering the final stage of editing, and we expect it to be
published by SIAM in fall 1999. The current draft of the book is
available via the URL:

http://www.ms.uky.edu/~bai/ET/contents.html

Technology Transfer

The  ScaLAPACK  library  for  dense  linear  algebra  computations  is  in 
the  process  of  transition  to  the  commercial  marketplace.  ScaLAPACK 
has  been  incorporated  into  several  commercial  packages,  including  the 
NAG  Parallel  Library,  IBM  Parallel  ESSL,  and  SGI  Cray  Scientific 
Software  Library,  and  is  being  integrated  into  the  VNI IMSL  Numerical 
Library,  as  well  as  software  libraries  for  Fujitsu,  Hewlett-Packard/Convex, 
Hitachi,  and  NEC. 

The  ScaLAPACK  library  has  become  an  official  release  for  ASCI 
Red’s  operating  system.  Each  build  of  the  operating  system  will  be 
validated  against  ScaLAPACK  and  each  new  compiler  and  operating 
system  drop  will  contain  an  automatically  generated  fresh  ScaLAPACK 
build.  It  will  be  in  /usr/lib,  as  standard  as  the  BLAS. 

The ScaLAPACK generalized Hermitian eigensolver is being enhanced and
has been incorporated into the electronic structure MP-Quest project at
Sandia National Laboratory. It is approximately ten times faster than
the previously existing eigensolver in MP-Quest, and will result in an
approximate 90 [...] an entry for the Gordon Bell competition based on
this work. Performance enhancements to the eigencode will be propagated
into the next ScaLAPACK release.

The  point  of  contact  for  ScaLAPACK  is  Susan  Blackford,  (865) 
974-5886,  susan@cs.utk.edu. 

The ARPACK work has been adopted by Mathworks and forms the basis for
the new eigs command in Matlab for sparse eigenvalue computation. We
have entered into a collaboration with Roldan Pozo of NIST to construct
an interface between ARPACK++ and TNT. This will provide the ability to
define and work with matrices in ARPACK++ using TNT matrix classes. We
have also established a research relationship with Sandia Albuquerque.
Parallel ARPACK has been linked to AZTEC (Sandia’s parallel iterative
linear solver package) and is being applied to a stability analysis of
thin film reactors. The point of contact for ARPACK, ARPACK++, and
P_ARPACK is Dr. Dan Sorensen, (713) 527-4805, sorensen@rice.edu.

CAPSS has been used in solving numerous problems in structural
mechanics, including shearing in foam-like materials and crack
propagation in extrusion processes. CAPSS has also been used in the
study of fluid flows using higher-order finite-element methods. MFACT
and CAPSS are being used to solve large-scale complex systems from
electromagnetic applications at Northrop Grumman. The point of contact
for CAPSS is Dr. Mike Heath, (217) 333-6268, heath@ncsa.uiuc.edu.

3  Publications  and  Technical  Reports 

1. A New Deflation Criterion for the QR Algorithm, M. Ahues and
F. Tisseur, University of Tennessee Technical Report, CS-97-353,
1997 (also LAPACK Working Note 122).

2. Performance Improvements to LAPACK for the Cray Scientific
Library, E. Anderson and M. Fahey, University of Tennessee
Technical Report, CS-97-359, 1997 (also LAPACK Working Note 126).

3. A Test Matrix Collection for Non-Hermitian Eigenvalue Problems,
Z. Bai, D. Day, J. Demmel, and J. Dongarra, University of Tennessee
Technical Report, CS-97-355, 1997 (also LAPACK Working Note 123).

Note 137).

9. ScaLAPACK Evaluation and Performance at the DoD MSRCs, L. S.
Blackford and R. C. Whaley, University of Tennessee Technical
Report, CS-98-388, 1998 (also LAPACK Working Note 136).

10. Design of a Library of Parallel Preconditioners, T. F. Chan and
97-58, December 1997.

11. Wavelet Sparse Approximate Inverse Preconditioners, Tony F.
Chan, W. P. Tang, and W. L. Wan, BIT, 37:3, 1997, pp. 644-660
(also UCLA CAM report 96-33, September 1996).

12. Galerkin Projection Methods for Solving Multiple Linear Systems,
Tony F. Chan and Michael K. Ng, UCLA CAM report 96-31,
September 1996. Submitted to SIAM J. Sci. Comp.

13. Boundary Treatments for Multilevel Methods on Unstructured Meshes,
Tony F. Chan, Susie Go, and Jun Zou, UCLA CAM report 96-3,
September 1996. To appear in SIAM J. Sci. Comp.

14. Multilevel Domain Decomposition and Multigrid Methods for
Unstructured Meshes: Algorithms and Theory, Tony F. Chan, Susie
Go, and Jun Zou, in Proc. of 8th Int’l Conf. on Domain
Decomposition Methods, J. Wiley, Beijing, May 1995, pp. 159-176
(also UCLA CAM report 95-24, May 1995).

15. Geometric Spectral Partitioning, Tony F. Chan, John R. Gilbert,
and Shang-Hua Teng, UCLA CAM report 95-5, January 1995.
Submitted to SIAM J. Sci. Comp.

16. A New Parallel Matrix Multiplication Algorithm on
Distributed-Memory Concurrent Computers, J. Choi, University of
Tennessee Technical Report, CS-97-369, 1997 (also LAPACK Working
Note 129).

17. Implementation in ScaLAPACK of Divide and Conquer Algorithms
for Banded and Tridiagonal Linear Systems, A. Cleary and
J. Dongarra, University of Tennessee Technical Report, CS-97-358,
1997 (also LAPACK Working Note 125).

18.  Packed  Storage  Extensions  for  ScaLAPACK,  E.  D’Azevedo  and  J. 
Dongarra,  University  of  Tennessee  Technical  Report,  CS-98-385, 
1998  (also  LAPACK  Working  Note  135).  Submitted  to  Parallel 
Computing. 

19.  Accurate  SVDs  of  Structured  Matrices,  J.  Demmel,  University 
of  Tennessee  Technical  Report,  CS-97-369,  1997  (also  LAPACK 
Working  Note  129). 

20.  SuperLU  Users’  Guide,  J.  Demmel,  J.  Gilbert,  and  X.  Li,  in 
preparation,  1997. 

21. An Asynchronous Parallel Supernodal Algorithm for Sparse
Gaussian Elimination, J. Demmel, J. Gilbert, and X. Li, University
of Tennessee Technical Report, CS-97-358, 1997 (also LAPACK
Working Note 124), to appear in SIAM J. Mat. Anal. Appl., 1997.

22. Computing the Singular Value Decomposition with High Relative
Accuracy, J. Demmel, M. Gu, S. Eisenstat, I. Slapnicar, K.
Veselic, and Z. Drmac, University of Tennessee Technical Report,
CS-97-348, 1997 (also LAPACK Working Note 119).

23. A Supernodal Approach to Sparse Partial Pivoting, J. Demmel,
S. Eisenstat, J. Gilbert, X. Li, and J. W. H. Liu, UC Berkeley
Computer Science Division Technical Report, UCB//CSD-95-883,
September 1995, to appear in SIAM J. Mat. Anal. Appl.

24. Scheduling Block-Cyclic Array Redistribution, F. Desprez,
J. Dongarra, A. Petitet, C. Randriamaro, and Y. Robert, University
of Tennessee Technical Report, CS-97-349, 1997 (also LAPACK
Working Note 120).

25. A New O(n^2) Algorithm for the Symmetric Tridiagonal
Eigenvalue/Eigenvector Problem, I. Dhillon, PhD thesis, Computer
Science Division, Department of Electrical Engineering and
Computer Science, University of California, Berkeley, CA, 1997.

26. Application of a New Algorithm for the Symmetric Eigenproblem
to Computational Quantum Chemistry, I. Dhillon, G. Fann, and
B. Parlett, Proceedings of the Eighth SIAM Conference on Parallel
Processing for Scientific Computing, SIAM, March 1997.

27. Computing the eigenvectors of a symmetric tridiagonal matrix, I.
S. Dhillon and B. N. Parlett, in preparation, 1997.

28. The Design and Implementation of the Parallel Out-of-core
ScaLAPACK LU, QR, and Cholesky Factorization Routines, J. J.
Dongarra and E. F. D’Azevedo, Department of Computer Science
Technical Report, CS-97-347, University of Tennessee, January
1997 (also LAPACK Working Note 118).

29. On Factorizations of the Hessenberg matrices arising from
Polynomial Iterative Methods, Victor Eijkhout, UCLA CAM report
96-44, October 1996. Submitted to Numerical Linear Algebra and
its Applications.

30. Residual Smoothing for Complex Symmetric Systems, V. Eijkhout,
UCLA Department of Mathematics CAM Report 97-59, December 1997.

31. ParPre: A Parallel Preconditioners Package reference manual for
version 2.0.17, V. Eijkhout and T. F. Chan, UCLA Department
of Mathematics CAM Report 97-24, June 1997.

32. Parallel Direct Methods for Sparse Linear Systems, Heath, M. T.,
in Parallel Numerical Algorithms, ed. by D. E. Keyes, A. Sameh,
and V. Venkatakrishnan, Kluwer Academic Publishers, Boston,
1997, pp. 55-90.

33. Scientific Computing: An Introductory Survey, Heath, M. T.,
McGraw-Hill, New York, 1997.

34. Performance of a Fully Parallel Sparse Solver, Heath, M. T., and
P. Raghavan, Int. J. Supercomput. Appl. High Perf. Comput.,
Vol. 11, No. 1, 1997, pp. 49-64.

35. Performance of Parallel Sparse Triangular Solution, Heath, M. T.,
and P. Raghavan, in Algorithms for Parallel Processing, Vol. 11,
M. T. Heath, A. Ranade, and R. S. Schreiber, eds.,
Springer-Verlag, New York, 1998, pp. 289-306.

36. A Parallel Implementation of the Nonsymmetric QR Algorithm
for Distributed Memory Architectures, G. Henry, D. Watkins, and
J. Dongarra, University of Tennessee Technical Report, CS-97-352,
1997 (also LAPACK Working Note 121).

37. ARPACK Users Guide: Solution of Large Scale Eigenvalue
Problems by Implicitly Restarted Arnoldi Methods, R. B. Lehoucq,
D. C. Sorensen, and C. Yang, 142 pages, SIAM Publications,
Philadelphia, PA, 1998.

38. Sparse Gaussian Elimination on High Performance Computers,
Li, X., University of Tennessee Technical Report, CS-97-368, 1997
(also LAPACK Working Note 127).

39. P_ARPACK: An Efficient Portable Large Scale Eigenvalue Package
for Distributed Memory Parallel Architectures, K.J. Maschhoff
and D.C. Sorensen, Lecture Notes in Computer Science 1184, Applied
Parallel Computing, J. Wasniewski, J. Dongarra, K. Madsen, and
D. Olesen, eds., Springer-Verlag, New York, pp. 478-486, 1996
(also Rice University Technical Report TR96-33(caam)).

40. A Comparison of Computational Complexities of HFEM and ABC
Based Finite Element Methods, Nasir, M. A., W. C. Chew, P.
Raghavan, and M. T. Heath, J. Electromagnetic Waves and
Applications, Vol. 11, 1997, pp. 1601-1617.

41. The Performance of Greedy Ordering Heuristics, Ng, E. G., and
P. Raghavan, to appear in SIAM J. Matrix Anal. Appl., 1998.

42.  A  new  class  of  preconditioners  for  large  scale  linear  systems  from 
interior  point  methods  for  linear  programming,  A.R.L.  Oliveira 
and  D.C.  Sorensen,  Rice  U.  CAAM-TR97-27,  1997,  (Submitted 
to  SIAM  J.  Optimization). 

43.  Computational  Experience  with  a  preconditioner  for  interior  point 
methods  for  linear  programming,  A.R.L.  Oliveira  and  D.C.  Sorensen, 
Rice  U.  CAAM-TR97-28,  November  1997,  (Submitted  to  SIAM 
J.  Optimization). 

44.  Fernando’s  Solution  to  Wilkinson’s  Problem:  An  Application  of 
Double  Factorization,  B.  N.  Parlett,  and  I.  S.  Dhillon,  Lin.  Alg. 
Appl.,  Volume  267,  pp.  247-279,  1997. 

45. Algorithmic Redistribution Methods for Block Cyclic
Decompositions, A. Petitet, University of Tennessee Technical
Report, CS-98-383, 1998 (also LAPACK Working Note 133).

46. Efficient Parallel Triangular Solution with Selective Inversion, P.
Raghavan, Technical Report CS-95-314, University of Tennessee,
Dec 1995. To appear in Parallel Processing Letters, 1998.

47.  Parallel  ordering  using  edge  contraction,  P.  Raghavan,  Parallel 
Computing,  Vol.  23,  No.  8,  1997,  pp.  1045-1067.  (also  University 
of  Tennessee  Technical  Report  CS-95-293,  May  1995.) 

48.  Sign-function  based  nonsymmetric  eigenroutine  for  ScaLAPACK, 
H.  Robinson,  Master’s  Thesis,  Mathematics  Dept.,  University  of 
California,  Berkeley  CA,  in  progress,  1997. 

49. Minimization of a Large Scale Quadratic Function Subject to a
Spherical Constraint, D.C. Sorensen, SIAM J. Optimization, Vol. 7,
pp. 141-161, 1997 (also Rice U. CAAM-TR 94-27).

50. New approaches to large scale Eigenanalysis, D. C. Sorensen, in
Computational Science for the 21st Century, Bristeau, Etgen,
Fitzgibbon, Lions, Periaux and Wheeler, eds., 62-71, John Wiley
and Sons Ltd, Chichester, England, 1997.

51. Implicitly Restarted Arnoldi/Lanczos Methods for Large Scale
Eigenvalue Calculations, D.C. Sorensen (invited survey paper), in
Parallel Numerical Algorithms, D. E. Keyes, A. Sameh, and
V. Venkatakrishnan, eds., Kluwer, Dordrecht, pp. 119-166, 1996.

52. A Truncated RQ-iteration for Large Scale Eigenvalue
Calculations, D. C. Sorensen and C. Yang, Rice University
CAAM-TR96-06, May 1996. (To appear in SIAM J. Matrix Analysis and
Applications).

53. Truncated QZ methods for Large Scale Generalized Eigenvalue
Problems, D.C. Sorensen, Rice U. CAAM-TR98-01, 1998 (Submitted
to Electronic Transactions on Numerical Analysis).

54. Execution Time of Symmetric Eigensolvers, K. Stanley, PhD
thesis, Computer Science Division, Department of Electrical
Engineering and Computer Science, University of California,
Berkeley, CA, 1997.

55. Parallelizing the Divide and Conquer Algorithm for the Symmetric
Tridiagonal Eigenvalue Problem on Distributed Memory
Architectures, F. Tisseur and J. Dongarra, University of Tennessee
Technical Report, CS-98-382, 1998 (also LAPACK Working Note
132).

56.  High  Performance  Linear  Algebra  Package  -  LAPACK90,  J.  Wasniewski 
and  J.  Dongarra,  University  of  Tennessee  Technical  Report,  CS- 
98-384,  1998  (also  LAPACK  Working  Note  134). 

57.  Automatically  Tuned  Linear  Algebra  Software,  R.  C.  Whaley  and 
J.  Dongarra,  University  of  Tennessee  Technical  Report,  CS-97- 
366,  1997  (also  LAPACK  Working  Note  131). 

58. Accelerating the Arnoldi Iteration - Theory and Practice, C. Yang,
PhD thesis, Dept. Computational and Applied Math, Rice
University, 1998 (also Rice U. CAAM-TR98-06).

59. Convergence Analysis of an Inexact Truncated RQ-Iteration, C.
Yang, Rice University CAAM-TR98-07, 1998. (Submitted to
Electronic Transactions on Numerical Analysis).

4 Participating Scientific Personnel Earning Advanced Degrees

• Inderjit Dhillon, Graduate student, University of California at
Berkeley, PhD 1997.

• Jennifer Finger, Graduate student, University of Tennessee at
Knoxville, MS 1997.

• Thomas Harrold, Graduate student, University of Tennessee at
Knoxville, MS 1997.

• Jeff Horner, Graduate student, University of Tennessee at
Knoxville, MS 1999.

• Song Jin, Graduate student, University of Tennessee at Knoxville,
MS 1999.

• Youngbae Kim, Graduate student, University of Tennessee at
Knoxville, PhD 1996.

• A. Oliveira, Graduate student, Rice University, PhD 1997.

• Antoine Petitet, Graduate student, University of Tennessee at
Knoxville, PhD 1996.

• Huan Ren, Graduate student, University of California at
Berkeley, PhD 1997.

• Howard Robinson, Graduate student, University of California at
Berkeley, MS 1997.

• M. Rojas, Graduate student, Rice University, PhD 1998.

• Ken Stanley, Graduate student, University of California at
Berkeley, PhD 1997.

• C. Yang, Graduate student, Rice University, PhD 1998.